Detecting Commas in Slovak Legal Texts

Authors

  • Róbert Sabo
  • Stefan Benus
Abstract

This paper reports on initial experiments with automatic comma recovery in Slovak legal texts. To decide whether to insert a comma between two words, we use two probabilities: that of the bigram formed by the two words without a comma, and that of the trigram formed by the same words with a comma between them. Both probabilities come from a language model trained on sentences in which commas are labeled as separate words; in the training database, one sentence corresponds to one line. The thresholds on the bigram and trigram probabilities were determined experimentally to achieve the best balance of precision and recall. The advantage of the proposed method is its high precision (95%) at a relatively satisfactory recall (49%). Precision is particularly important for judges, the potential users of an ASR system with an automatic comma insertion function.
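
The decision rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the count-based relative frequencies, the way the two thresholds are combined, the threshold values, the function names, and the toy corpus are all assumptions made for illustration. The abstract only states that a bigram probability (without comma) and a trigram probability (with comma) are compared against experimentally tuned thresholds.

```python
from collections import Counter


def train_counts(lines):
    """Count unigrams, bigrams and trigrams.

    Each line is one sentence; commas are whitespace-separated tokens.
    """
    uni, bi, tri = Counter(), Counter(), Counter()
    for line in lines:
        tokens = line.split()
        uni.update(tokens)
        bi.update(zip(tokens, tokens[1:]))
        tri.update(zip(tokens, tokens[1:], tokens[2:]))
    return uni, bi, tri


def comma_scores(w1, w2, counts):
    """Return (p_plain, p_comma) for the word pair (w1, w2).

    Assumption: both scores are relative frequencies given w1, i.e.
    count(w1 w2) / count(w1) and count(w1 , w2) / count(w1).
    """
    uni, bi, tri = counts
    total = uni[w1]
    if total == 0:
        return 0.0, 0.0
    return bi[(w1, w2)] / total, tri[(w1, ",", w2)] / total


def insert_commas(tokens, counts, bigram_max=0.1, trigram_min=0.5):
    """Insert a comma where the trigram score is high and the bigram score is low.

    The thresholds are hypothetical placeholders, not the tuned values
    from the paper.
    """
    if not tokens:
        return []
    out = [tokens[0]]
    for w1, w2 in zip(tokens, tokens[1:]):
        p_plain, p_comma = comma_scores(w1, w2, counts)
        if p_comma >= trigram_min and p_plain <= bigram_max:
            out.append(",")
        out.append(w2)
    return out


# Toy training data (invented for this sketch): one sentence per line.
corpus = [
    "súd rozhodol , že žaloba je dôvodná",
    "svedok uviedol , že nič nevidel",
    "súd uviedol , že žaloba je dôvodná",
]
counts = train_counts(corpus)

unpunctuated = "svedok rozhodol že žaloba je dôvodná".split()
print(" ".join(insert_commas(unpunctuated, counts)))
```

Requiring both a high with-comma score and a low without-comma score favors precision over recall, which matches the priority stated in the abstract; a real system would use a smoothed language model rather than raw counts.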

Similar resources

Slovak Automatic Dictation System for Judicial Domain

This paper describes the design, development and evaluation of the Slovak dictation system for the judicial domain. The speech is recorded using a close-talk microphone and the dictation system is used for on-line or off-line automatic transcription. The system provides an automatic dictation tool in Slovak for the employees of the Ministry of Justice of the Slovak Republic and all the courts i...

Automatic Text Formatting for Social Media based on Linefeed and Comma Insertion

With the rise of social media, people can now easily transmit information on a personal level. However, because social media users generally spend little time composing their posts, low-quality texts are transmitted, which hinders the spread of information. In texts transmitted on social media, commas and linefeeds are inserted incorrectly, and this becomes a factor of low-...

Automatic Comma Insertion for Japanese Text Generation

This paper proposes a method for automatically inserting commas into Japanese texts. In Japanese sentences, commas play an important role in explicitly separating the constituents, such as words and phrases, of a sentence. The method can be used as an elemental technology for natural language generation such as speech recognition and machine translation, or in writing-support tools for non-nati...

The Similarity Detection in Slovak Texts by Compression Method

This paper deals with similarity and plagiarism, with a focus on Slovak texts. It presents and analyzes standard methods and tools used to detect plagiarism in order to apply the conclusions to its own solution. It explains the principle of the dictionary-based data compression method known as Lempel-Ziv, whose idea of building a dictionary is used as the basis for our method proposal to dete...

The (re)presentation of the Author in Czech and Slovak Scientific Texts

This paper poses the question of how academic writers present themselves to the audience and focuses on the functions of forms of self-reference in Czech and Slovak scientific discourse. For scientific texts, the Latin rhetorical tradition recommended the so-called pluralis modestiae or pluralis auctoris as an appropriate linguistic means of self-presentation of the writer, conveying his modest a...

Publication date: 2014